Current Issue: Volume 2022, Issue 3 (July-September), 5 Articles
In recent years, deep neural networks have matured considerably, and since the introduction of the generative adversarial mechanism, academia has produced many results in image, video, and text generation. Scholars have therefore begun making similar attempts in music generation. Building on existing theory and prior work, this paper studies music production and proposes an intelligent music generation technique based on the generative adversarial mechanism, enriching research in the field of computer music generation. Taking GAN-based music generation as its research topic, the paper mainly studies the following. After examining existing GAN-based music generation models, a temporal-structure model for maintaining musical coherence is proposed; during generation it avoids manual input while preserving the interdependence between tracks. The paper also studies and implements a method for generating discrete, multi-track musical events, covering both a multi-track correlation model and discretization. The Lakh MIDI dataset is studied and preprocessed to obtain the LMD piano-roll dataset, which is used in the MCT-GAN music generation experiments. For multi-track music generation with GANs, three models are analyzed, and a multi-track generation method based on CT-GAN is put forward that mainly improves on existing GAN-based music generation models. Finally, the output of MCT-GAN is compared with that of MuseGAN to assess the improvement: 20 listeners were asked to distinguish generated music from real music, and the evaluation results were analyzed. The evaluation concludes that CT-GAN-based multi-track music generation is indeed improved.
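As a rough illustration of the multi-track setup this abstract describes, the sketch below shows one shared latent vector decoded by per-track heads, so that tracks remain interdependent without manual input. This is a minimal PyTorch sketch under stated assumptions (track count, piano-roll dimensions, and layer sizes are illustrative), not the paper's MCT-GAN architecture.

```python
# Minimal sketch of a multi-track piano-roll GAN in the spirit of the
# MCT-GAN / MuseGAN setup above. All shapes and sizes are assumptions.
import torch
import torch.nn as nn

N_TRACKS, N_BARS, N_STEPS, N_PITCHES = 4, 4, 96, 84  # assumed dimensions

class Generator(nn.Module):
    def __init__(self, z_dim=128):
        super().__init__()
        # One shared latent models inter-track dependence; per-track heads
        # decode it into each instrument's piano roll.
        self.shared = nn.Sequential(nn.Linear(z_dim, 1024), nn.ReLU())
        self.heads = nn.ModuleList(
            nn.Linear(1024, N_BARS * N_STEPS * N_PITCHES)
            for _ in range(N_TRACKS)
        )

    def forward(self, z):
        h = self.shared(z)
        rolls = [head(h).view(-1, 1, N_BARS * N_STEPS, N_PITCHES)
                 for head in self.heads]
        return torch.sigmoid(torch.cat(rolls, dim=1))  # (B, tracks, time, pitch)

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(N_TRACKS, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Flatten(),
            nn.LazyLinear(1),  # real/fake score for adversarial training
        )

    def forward(self, x):
        return self.net(x)

z = torch.randn(8, 128)
fake = Generator()(z)          # generated piano rolls for all tracks
score = Discriminator()(fake)  # critic scores
```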
To better establish the Chinese-Korean translation system model, deep transfer learning and the model system are tested and analyzed, with the following results. Comparing the adjustment mechanisms of deep transfer learning under the MMD metric and the Wasserstein metric shows that, for the MMD-metric model on datasets 1 and 2, the highest accuracy is 83.1% with the multisource weight adjustment mechanism and the lowest is 62.7% with no weight adjustment mechanism, and the accuracies on datasets 1 and 2 are above the average. Under the Wasserstein metric, the accuracy on dataset 1 is 82.5% with multisource weighting and 68.5% without source weighting, both above the average. Three EEGNet models (EEGNet_0, EEGNet_1, and EEGNet_2) were built for comparative testing; the results show that EEGNet_1 achieves the highest accuracy and is preferred for building the system. Comparing the Chinese-Korean translation model with the blockchain model and the traditional translation model shows that, at 100 test sentences, the average response time and peak-traffic response time of the Chinese-Korean translation model are lower than those of the traditional model, and the test passes. At 1000 test sentences, its average response time and peak-traffic response time remain lower than the traditional method's. The efficiency and success rate of the Chinese-Korean translation model therefore exceed those of the traditional translation system and meet the requirements. Performance testing further shows that the Chinese-Korean translation system outperforms the traditional system in average response time and success rate across different data volumes: with 500 test items, the average response time is 13 ms with 100% accuracy; with 3000 test items, the average response time is 99 ms with a 99.6% success rate. The success rate of the translation system is thus consistently at or above 99.6%, higher than the traditional system's. Overall, the Chinese-Korean translation system improves translation efficiency and accuracy and is the preferred choice.
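For readers unfamiliar with the MMD metric the comparison above relies on, the following is a minimal NumPy sketch of a Gaussian-kernel maximum mean discrepancy estimate between source- and target-domain feature batches. The kernel bandwidth, feature dimensions, and synthetic data are illustrative assumptions, not values from the paper.

```python
# Sketch: squared MMD with a Gaussian kernel, a common domain-gap measure
# in deep transfer learning. All constants here are illustrative.
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # Pairwise squared distances between rows of x and rows of y.
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def mmd2(source, target, sigma=1.0):
    # Biased estimate of squared MMD between two feature batches.
    k_ss = gaussian_kernel(source, source, sigma).mean()
    k_tt = gaussian_kernel(target, target, sigma).mean()
    k_st = gaussian_kernel(source, target, sigma).mean()
    return k_ss + k_tt - 2.0 * k_st

rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, size=(64, 16))  # source-domain features
tgt = rng.normal(0.5, 1.0, size=(64, 16))  # shifted target-domain features
print(f"MMD^2 estimate: {mmd2(src, tgt):.4f}")  # larger = bigger domain gap
```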
Music is an important carrier of emotion and an indispensable part of daily life. With the rapid growth of digital music, demand for music emotion analysis and retrieval keeps increasing, and automatic recognition of musical emotion has become a major research focus. For music, emotion is its most essential feature and its deepest inner meaning. In a ubiquitous information environment, revealing the deep semantic information of multimodal information resources and providing users with integrated information services has significant research and application value. This paper proposes a multimodal fusion algorithm for music emotion analysis and builds a dynamic model based on reinforcement learning to improve analysis accuracy. The model dynamically adjusts its emotion analysis results by learning from user behavior, thereby personalizing to each user's emotional preferences.
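As a rough illustration of the feedback-driven adjustment described above (not the paper's actual model), the sketch below treats user behavior as a reward signal that nudges per-emotion preference weights. The emotion labels, reward scheme, and learning rate are all assumptions made for the example.

```python
# Sketch: bandit-style adjustment of emotion-analysis scores from user
# feedback. Labels, rewards, and learning rate are illustrative.
import numpy as np

EMOTIONS = ["happy", "sad", "calm", "tense"]  # assumed label set

class EmotionPreferenceModel:
    """Re-ranks base emotion scores using weights learned from user feedback."""

    def __init__(self, lr=0.1):
        self.lr = lr
        self.weights = np.ones(len(EMOTIONS))  # per-user preference weights

    def adjust(self, base_scores):
        # Combine the analyzer's scores with the learned user preferences.
        scores = base_scores * self.weights
        return scores / scores.sum()

    def update(self, emotion_idx, reward):
        # Reward +1 (e.g., track played fully) or -1 (skipped) shifts the weight.
        self.weights[emotion_idx] = max(
            1e-3, self.weights[emotion_idx] + self.lr * reward
        )

model = EmotionPreferenceModel()
base = np.array([0.4, 0.2, 0.3, 0.1])             # multimodal analyzer output
model.update(EMOTIONS.index("calm"), reward=1.0)  # user liked a calm track
print(model.adjust(base))                         # "calm" weighted slightly up
```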
In this work, we first propose a deep neural network (DNN) system for the automatic detection of speech in audio signals, otherwise known as voice activity detection (VAD). Several DNN types were investigated, including multilayer perceptrons (MLPs), recurrent neural networks (RNNs), and convolutional neural networks (CNNs), with the best performance being obtained for the latter. Additional postprocessing techniques, i.e., hysteretic thresholding, minimum duration filtering, and bilateral extension, were employed in order to boost performance. The systems were trained and tested using several data subsets of the CENSREC-1-C database, with different simulated ambient noise conditions, and additional testing was performed on a different CENSREC-1-C data subset containing actual ambient noise, as well as on a subset of the TIMIT database. An accuracy of up to 99.13% was obtained for the CENSREC-1-C datasets, and 97.60% for the TIMIT dataset. We proceed to show how the final VAD system can be adapted and employed within an utterance-level deceptive speech detection (DSD) processing pipeline. The best DSD performance is achieved by a novel hybrid CNN-MLP network leveraging a fusion of algorithmically and automatically extracted speech features, and reaches an unweighted accuracy (UA) of 63.7% on the RLDD database, and 62.4% on the RODeCAR database.
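The three postprocessing steps named above can be illustrated on a sequence of per-frame speech probabilities. The following is a minimal sketch; the thresholds, minimum duration, and extension length are chosen for illustration rather than taken from the paper.

```python
# Sketch of the VAD postprocessing chain: hysteretic thresholding,
# minimum duration filtering, and bilateral extension. All parameter
# values are illustrative assumptions.
import numpy as np

def postprocess(probs, hi=0.7, lo=0.3, min_len=3, ext=2):
    # 1) Hysteretic thresholding: enter speech above `hi`, leave below `lo`.
    speech = np.zeros(len(probs), dtype=bool)
    active = False
    for i, p in enumerate(probs):
        active = (p >= hi) if not active else (p > lo)
        speech[i] = active
    # 2) Minimum duration filtering: drop speech runs shorter than `min_len`.
    # 3) Bilateral extension: pad each surviving run by `ext` frames per side.
    out = np.zeros_like(speech)
    i = 0
    while i < len(speech):
        if speech[i]:
            j = i
            while j < len(speech) and speech[j]:
                j += 1
            if j - i >= min_len:
                out[max(0, i - ext):min(len(speech), j + ext)] = True
            i = j
        else:
            i += 1
    return out

probs = np.array([0.1, 0.8, 0.9, 0.6, 0.5, 0.2, 0.1, 0.9, 0.1, 0.1])
print(postprocess(probs))  # short spike at frame 7 is filtered out
```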
To achieve fast and accurate recognition and enhancement of musical technique for vocal music teaching, this paper proposes a recognition method combining transfer learning with a convolutional neural network (CNN). First, standard-timbre vocal recordings are preprocessed by panning, flipping, rotating, and scaling, and are manually labeled by vocal technique features: breathing method, articulation method, pronunciation method, and pitch-region training. Then, following the transfer-learning approach, weight parameters from a CNN pretrained on a sound dataset are transferred to the vocal recognition task: the pretrained convolutional and pooling layers serve as feature extraction layers, the top of the network is redesigned as a global average pooling layer and a Softmax output layer, and some of the convolutional layers are frozen during training. Experiments show an average test accuracy of 86%, a training time of roughly half that of the original model, and a model size of only 74.2 MB. The model's F1 scores are 0.88, 0.80, 0.83, and 0.85 on the four aspects of breathing method, pronunciation method, articulation method, and pitch-region training, respectively. The results demonstrate that the method is efficient, effective, and transferable for vocal music teaching recognition.
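The freeze-and-replace recipe described above can be sketched with a standard pretrained backbone. Here ResNet-18 stands in for the paper's sound-pretrained CNN, and the four-class head, input shape, and optimizer settings are illustrative assumptions.

```python
# Sketch: transfer learning by freezing pretrained convolutional layers and
# retraining only a new classification head. The backbone choice and all
# hyperparameters are assumptions, not the paper's exact configuration.
import torch
import torch.nn as nn
from torchvision import models

N_CLASSES = 4  # breathing, articulation, pronunciation, pitch-region training

# Load a pretrained backbone (stand-in for the paper's sound-pretrained CNN).
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for p in backbone.parameters():
    p.requires_grad = False  # freeze the pretrained convolutional layers

# ResNet already ends in global average pooling, so redesigning the top layer
# amounts to swapping in a new linear head; softmax is applied at inference.
backbone.fc = nn.Linear(backbone.fc.in_features, N_CLASSES)

optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)  # head only
logits = backbone(torch.randn(2, 3, 224, 224))  # dummy spectrogram-image batch
probs = torch.softmax(logits, dim=1)
print(probs.shape)  # (2, 4)
```

Training only the new head is what yields the roughly halved training time reported above, since gradients are never computed for the frozen feature extractor.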